
    Scaling DBSCAN-like algorithms for event detection systems in Twitter

    The increasing use of mobile social networks has recently transformed news media. Real-world events are now reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has attracted a lot of interest from both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods poses two major challenges. First, detecting real-world events from the vast amount of tweets can no longer be performed on a single machine. Second, tweeting activity varies greatly within these broad space-time regions, which limits the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset of geo-located tweets from the Iberian Peninsula collected during several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy. Peer Reviewed. Postprint (author's final draft)
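
    The density-aware scheme described above can be illustrated with a small single-machine sketch in Python. It is not the authors' Spark implementation: the (x_km, y_km, t_hours) tweet tuples, the uniform grid used for spatio-temporal partitioning, the `time_scale` weighting and the median k-nearest-neighbour heuristic for picking a local eps are all assumptions made for illustration, and every function name is hypothetical. Each grid cell plays the role of one MapReduce partition that derives its own eps from its local tweet density before running plain DBSCAN.

```python
import math
from collections import defaultdict

# Hypothetical tweet representation: (x_km, y_km, t_hours), i.e. latitude and
# longitude already projected to a planar grid. Not the paper's actual schema.

def dist(a, b, time_scale):
    """Spatio-temporal distance; time_scale (km per hour) is an assumption."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                     + ((a[2] - b[2]) * time_scale) ** 2)

def partition(points, cell_km, cell_h):
    """'Map' step: key each tweet by its spatio-temporal grid cell."""
    cells = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / cell_km), math.floor(p[1] / cell_km),
               math.floor(p[2] / cell_h))
        cells[key].append(p)
    return cells

def local_eps(points, k, time_scale):
    """Density-aware eps: median k-th-nearest-neighbour distance in a cell."""
    kdist = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q, time_scale)
                    for j, q in enumerate(points) if j != i)
        if len(ds) >= k:
            kdist.append(ds[k - 1])
    kdist.sort()
    return kdist[len(kdist) // 2] if kdist else 0.0

def dbscan(points, eps, min_pts, time_scale):
    """Plain O(n^2) DBSCAN; labels are cluster ids, -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j], time_scale) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # may later be relabelled as a border point
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nbrs)
        cluster += 1
    return labels

def density_aware_dbscan(points, cell_km=10.0, cell_h=1.0, k=4, min_pts=4,
                         time_scale=1.0):
    """'Reduce' step: run DBSCAN per cell with its own locally tuned eps."""
    out = {}
    for key, pts in partition(points, cell_km, cell_h).items():
        eps = local_eps(pts, k, time_scale)
        out[key] = (eps, dbscan(pts, eps, min_pts, time_scale))
    return out
```

    In a Spark job, the per-cell loop in `density_aware_dbscan` is the part that would run in parallel (for example, one task per cell key). The sketch deliberately omits the step a real distributed DBSCAN needs to merge clusters that straddle cell boundaries.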

    Implementation of the RX algorithm for anomaly detection in hyperspectral images of the Earth's surface using reconfigurable hardware

    In recent years, the analysis of remotely sensed hyperspectral images has become a very active research field. Owing to the spatial resolution available in sensors on airborne or satellite platforms, and to the way different materials appear in nature, a hyperspectral image contains two types of pixels: pure and mixed. The difference between them is that the spectral signature of a pure pixel corresponds to a single material, whereas that of a mixed pixel is a combination of several materials. Spectral unmixing is frequently used to analyze the composition of each pixel in a hyperspectral image. In certain situations a complete analysis of the image is not of interest, usually because of the excessive execution time it would entail, and it is sufficient to apply anomaly detection techniques. A well-known approach to anomaly detection, developed by Reed and Yu, is commonly referred to as the RX algorithm, and it is the technique on which this work is based. In this final-degree project, the complete RX algorithm has been designed and implemented, except for the computation of the covariance matrix, which is assumed to be given. The implementation is specified in the VHDL hardware description language and targets reconfigurable hardware platforms of the Field Programmable Gate Array (FPGA) type.
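
    The RX detector scores each pixel x by its Mahalanobis distance from the background, (x - mu)^T K^-1 (x - mu), where mu is the mean spectrum and K the covariance matrix; as in the project described above, K is taken as given. The pure-Python sketch below is a software reference model, not the VHDL/FPGA design, and its function names are hypothetical. It solves K y = (x - mu) for each pixel instead of forming K^-1 explicitly.

```python
def mean_vector(pixels):
    """Per-band mean spectrum of the (background) pixels."""
    n, bands = len(pixels), len(pixels[0])
    return [sum(p[b] for p in pixels) / n for b in range(bands)]

def solve(K, v):
    """Solve K y = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(map(float, row)) + [float(v[i])] for i, row in enumerate(K)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        y[r] = (A[r][n] - sum(A[r][k] * y[k] for k in range(r + 1, n))) / A[r][r]
    return y

def rx_scores(pixels, K, mu=None):
    """RX score (x - mu)^T K^-1 (x - mu) for every pixel; K is given."""
    if mu is None:
        mu = mean_vector(pixels)
    scores = []
    for x in pixels:
        d = [xb - mb for xb, mb in zip(x, mu)]
        y = solve(K, d)              # y = K^-1 d without forming K^-1
        scores.append(sum(db * yb for db, yb in zip(d, y)))
    return scores
```

    Because K is the same for every pixel, an efficient implementation would normally factor or invert it once and reuse the result; re-solving per pixel here only keeps the sketch short. Pixels with the largest scores are the anomaly candidates.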

    Retrospective event detection from tweets through MR-DBSCAN in Spark

    Messages posted on Location-Based Social Networks (LBSNs) such as Twitter report everything from daily life stories to the latest local and global news. Monitoring and analyzing this rich and continuous stream of user-generated content can yield highly valuable information, giving users and organizations knowledge of occurrences and events. Because the volume of user-generated content is now extremely large, parallel processing becomes essential for complex data analysis. In this context, we propose OctreeDBSCAN 3D, an event discovery technique based on a scalable, distributed implementation of the popular density-based clustering algorithm DBSCAN. First, we evaluate the performance of the proposed algorithm with respect to previous works, achieving speed-ups of 30 in the improved phases; second, we evaluate the correctness of the clustering on real collected data, in which some tagged events were detected. In addition, tweeting activity varies greatly within geographically large regions and temporally long periods, limiting the use of the global parameters needed by DBSCAN-like algorithms. With this in mind, we also present a novel density-aware MapReduce scheme that partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. The results of evaluating this scheme point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy.
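
    The abstract does not detail how OctreeDBSCAN 3D builds or uses its octree. One plausible reading, sketched below purely as an assumption, is to subdivide the (x, y, t) bounding box of the tweets recursively into eight octants until each box holds at most `capacity` points, so dense regions end up in small partitions and sparse regions in large ones; each leaf then feeds one parallel DBSCAN task. `OctNode`, `build_octree`, `leaves` and `with_halo` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class OctNode:
    lo: tuple                 # (x, y, t) lower corner of the box
    hi: tuple                 # (x, y, t) upper corner of the box
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

def build_octree(points, lo, hi, capacity, max_depth=16):
    """Split the (x, y, t) box into 8 octants until a box holds <= capacity points."""
    node = OctNode(lo, hi, list(points))
    if len(points) <= capacity or max_depth == 0:
        return node           # leaf: becomes one partition
    mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
    buckets = [[] for _ in range(8)]
    for p in points:
        # Octant index: bit d is set when coordinate d lies in the upper half.
        idx = sum(1 << d for d in range(3) if p[d] >= mid[d])
        buckets[idx].append(p)
    for idx, bucket in enumerate(buckets):
        clo = tuple(mid[d] if (idx >> d) & 1 else lo[d] for d in range(3))
        chi = tuple(hi[d] if (idx >> d) & 1 else mid[d] for d in range(3))
        node.children.append(build_octree(bucket, clo, chi, capacity, max_depth - 1))
    node.points = []          # inner nodes keep no points
    return node

def leaves(node):
    """Non-empty leaves, i.e. the partitions handed to parallel DBSCAN tasks."""
    if not node.children:
        return [node] if node.points else []
    return [leaf for child in node.children for leaf in leaves(child)]

def with_halo(leaf, all_points, eps):
    """Leaf points plus any point within eps of the leaf's box (border replicas)."""
    return [p for p in all_points
            if all(leaf.lo[d] - eps <= p[d] <= leaf.hi[d] + eps for d in range(3))]
```

    `with_halo` replicates points lying within eps of a leaf's box so that clusters crossing a partition boundary can be detected and merged afterwards; that merge step is not shown.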
